DIRNET


1. The DIR net. 

This document describes the DIR net [1], a distributed environment which is part of the EFTOS fault
tolerance framework [2]. The DIR net is a system consisting of two components, called DIR Manager (or,
for short, the manager) and DIR Backup Agent (for short, the backup). One manager and a set of backups are
deployed in the system to be ‘guarded’, one component per node. Together they weave a web
which essentially does two things:

• makes itself tolerant to a number of possible faults, and 

• gathers information pertaining to the run of the user application.

As soon as an error occurs within the DIR net, the system executes built-in recovery actions that allow it
to continue processing despite a number of hardware/software faults, possibly by gracefully degrading
its features. When an error occurs in the user application, the DIR net is informed of the event by means of
custom- and user-defined detection tools, and runs one or more recovery strategies, both built-in and coded
by the user with an ancillary compile-time tool, the rl translator [3]. This tool translates the user-defined
strategies into a binary “R-code”, i.e., a pseudo-code interpreted by a special component of the DIR net, the
Recovery Interpreter, rint (in a sense, rint is an R-code virtual machine).

This document describes the generic component of the DIR net, a function which can behave either as 
manager or as backup. 

2. The first mailbox id available to the DIR net.
#define DIR_MBOX_OFFSET 20

3. Same as above, but for alias ids.
#define DIR_ALIAS_OFFSET 20 
4. On each node n, with 0 ≤ n < MAX_PROCS, a DIR net component is to run. This component can be addressed
internally by means of mailbox MBOX(n) and externally (from other nodes) by means of alias ALIAS(n).

#define MBOX(i) DIR_MBOX_OFFSET
#define IAT_MBOX (DIR_MBOX_OFFSET + 1)   /* I'm Alive Task */
#define RINT_MBOX (DIR_MBOX_OFFSET + 2)  /* Recovery Interpreter */
#define DB_MBOX (DIR_MBOX_OFFSET + 3)    /* database */
#define TOM_MBOX (DIR_MBOX_OFFSET + 4)   /* timeout manager */
#define ALIAS(i) DIR_ALIAS_OFFSET
#define IAT_ALIAS IAT_MBOX

/* timeout types (i.e., message codes), cyclicity and deadlines */
#define IA_FLAG_TIMEOUT 10
#define IA_FLAG_CYCLIC TOM_CYCLIC
#define IA_FLAG_DEADLINE IMALIVE_CLEAR_TIMEOUT
#define MIA_TIMEOUT 15
#define MIA_CYCLIC TOM_CYCLIC
#define MIA_DEADLINE MIA_SEND_TIMEOUT
#define TAIA_TIMEOUT 20
#define TAIA_CYCLIC TOM_CYCLIC
#define TAIA_DEADLINE TAIA_RECV_TIMEOUT
#define TEIF_TIMEOUT 30
#define TEIF_CYCLIC TOM_NON_CYCLIC
#define TEIF_DEADLINE IMALIVE_SET_TIMEOUT

/* backup-agent variants (suffix _B) */
#define IA_FLAG_TIMEOUT_B 50
#define IA_FLAG_CYCLIC_B TOM_CYCLIC
#define IA_FLAG_DEADLINE_B IMALIVE_CLEAR_TIMEOUT
#define MIA_TIMEOUT_B 55
#define MIA_CYCLIC_B TOM_CYCLIC
#define MIA_DEADLINE_B MIA_RECV_TIMEOUT
#define TAIA_TIMEOUT_B 60
#define TAIA_CYCLIC_B TOM_CYCLIC
#define TAIA_DEADLINE_B TAIA_SEND_TIMEOUT
#define TEIF_TIMEOUT_B 70
#define TEIF_CYCLIC_B TOM_NON_CYCLIC
#define TEIF_DEADLINE_B IMALIVE_SET_TIMEOUT

/* I'm Alive Task and fault injection */
#define IAT_TIMEOUT 40
#define IAT_CYCLIC TOM_CYCLIC
#define IAT_DEADLINE IMALIVE_SET_TIMEOUT
#define INJECT_FAULT_TIMEOUT 6
#define INJECT_FAULT_DEADLINE 6000000 /* 6 secs */

(Global Variables and #include’s 5)
(Generic component of the DIR net 6)
(Alarm function 52)
(I’m Alive Task 61)
(GetState and SetState 56)
(TEX routines simulated on EPX 59)
(DIR Print Message 60)
5. We need to include a number of header files, i.e., those pertaining to the timeout manager, ...

(Global Variables and #include’s 5) =
#include <stdio.h>
#include <stdlib.h>
#include <sys/root.h>
#include <sys/logerror.h>
#include <sys/link.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/thread.h>
#include <sys/sem.h>
#include <string.h>
#include "tom.h"
#include "timeouts.h"
#include "dirdefs.h"
#ifdef EPX1_2
typedef int IDF;
#endif
#include "dirtypes.h"
#include "rcode.h"
#include "trl.h" /* TEX-specific includes types */
#ifdef TEX
#include <links.h>
#include <remote_mbox.h>
#include <thread.h>
#else
int TEXFirstActivation = 1;
typedef int Alias_t;
typedef int IDF;
#define MSG_OK 0
#define INFINITE 0
#endif
#ifdef EPX1_2
#include <sys/rrouter.h>
int RemoteSendMessage(int, int, char *, int);
int TEXSendMessage(int, char *, int);
int TEXReceiveMessage(int, char *, int *, int);
int GetRoot(void);
#endif
int TEXGetState(status_t *);
int TEXSetState(status_t *);
#ifndef EPX
int Export(Alias_t, IDF);
STATUS TEXGetTaskStatus(IDF);
#endif
int TEXGetNumTasks(void);
int TEXStopTask(void);
int TEXRestartTask(void);
int flag = 0;
char *role2ascii(int);
char *DIRPrintTimeout(int);
char *DIRPrintMessage(int);





char *DIRPrintCode(int);
int send_timeout_message(TOM *);
extern Semaphore_t sem, *tom_sem;
#ifndef _TOM_H_
typedef struct {
  int running;
  int deadline;
  int (*alarm)(struct TOM *);
  unsigned char id, subid;
  unsigned char cyclic;
  unsigned char suspended;
} timeout_t;
typedef struct block_t {
  struct block_t *next;
  timeout_t timeout;
  unsigned char used;
} block_t;
typedef struct TOM {
  block_t *top;
  int tom_id;
  block_t *block_stack;
  int block_sp;
  int (*default_alarm)(struct TOM *);
  LinkCB_t *link[2];
  unsigned int starting_time;
} TOM;
typedef struct {
  timeout_t *timeout;
  unsigned char code;
} tom_message_t;
#endif
#ifndef __DIR_TYPES_
typedef struct {
  char configuration;
  int runlevel;
} DIR_state_t;
typedef struct {
  char primary;
  char role;
} status_t;
#endif
#define MAXARG 5
typedef struct {
  int arg[MAXARG];
  int subid;
  int type;
  char local;
} message_t;
int IA_flag;
message_t message;
extern DIR_db_t db;
int errors;
This code is used in section 4.
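Throughout the code, a message_t is shipped as raw bytes, via (char *) &message and sizeof(message). The following sketch mimics that round trip with a hypothetical one-slot buffer in place of a real mailbox; send_raw and recv_raw are illustrative names, not DIR net or TEX functions. It assumes, as the raw-byte transfer itself seems to, that all nodes share the same struct layout, word size and byte order.

```c
#include <assert.h>
#include <string.h>

#define MAXARG 5

/* Same field layout as message_t in section 5. */
typedef struct {
    int arg[MAXARG];
    int subid;
    int type;
    char local;
} message_t;

/* Hypothetical stand-in for a remote mailbox: one message-sized buffer. */
static char wire[sizeof(message_t)];
static int wire_len = 0;

/* Sender side: copy the raw bytes, as the DIR net does with
   RemoteSendMessage(..., (char *) &message, sizeof(message)). */
static void send_raw(const char *buf, int len)
{
    if (len < 0 || len > (int) sizeof(wire))
        return;                       /* refuse oversized payloads */
    memcpy(wire, buf, (size_t) len);
    wire_len = len;
}

/* Receiver side: rebuild the struct from the bytes.
   Returns 0 on success, -1 if the payload size does not match. */
static int recv_raw(message_t *out)
{
    if (wire_len != (int) sizeof(message_t))
        return -1;
    memcpy(out, wire, sizeof(message_t));
    return 0;
}
```

The size check on the receiving side is the only protection such a scheme has: a peer compiled with a different layout would deliver bytes that decode into the wrong fields without any error.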

6. On every node a DIRNetGenericComponent() function is run. This function first runs a protocol to
determine its own role and who the manager is, and to build or rebuild a global database; after this phase it
runs either as a manager or as a backup agent.

Merging the code of the backup and of the manager into one component has two major benefits:
(1) you only need to provide n replicas of this generic component, without specifying which role each replica
has to play, and (2) a lot of code is shared between the two components.

(Generic component of the DIR net 6) =
void DIRNetGenericComponent(void)
{
  (Variables local to the Manager and the Backup 7)
  InitSem(tom_sem, 1);
  (DIR net initialisation 8)
  (Spawn the I’m Alive Task 69)
  if (role == DIR_MANAGER) {
    (DIR net manager 9)
  }
  else {
    (DIR net backup agent 33)
  }
  TEXStopTask();
}

This code is used in section 4. 

7. These variables are shared between the Manager and the Backup Agent. 

(Variables local to the Manager and the Backup 7) =
status_t mystate;
STATUS status;
int role, managerid;
TOM *tom;
timeout_t mia[MAX_PROCS], taia[MAX_PROCS], teif, ia, inject;
char suspicion_period[MAX_PROCS];
int n, sender;

This code is used in section 6. 
8. This code initialises the DIR net. It checks whether this node and this component are both “new”
(a node is new if it has never been rebooted; a component is “new” if it is the original component of this
node, i.e., a primary). If so, the component gets its role from the state stored in the database; otherwise, it asks
the neighbouring nodes who the manager is. This section also takes care of building or receiving the system
database: if the node is new, a local database is built and the global database is created by a broadcast
session; otherwise, the global database is requested from a remote node.

(DIR net initialisation 8) =
TEXGetState(&mystate);
if (TEXFirstActivation && mystate.primary) { /* only two roles are possible - backup or manager */
  (Read your role from the RL script 57);
  if (db.role == DIR_BACKUP) role = DIR_BACKUP;
  else role = DIR_MANAGER;
  LogError(EC_ERROR, "Generic", "role of %d is %s", GetRoot(), role2ascii(role));
  fork();
  /* export your main mailbox */
  Export(ALIAS(GetRoot()), MBOX(GetRoot()));
  /* export your I’m Alive Task’s mailbox */
  Export((Alias_t) IAT_ALIAS, (IDF) IAT_MBOX);
  /* export the database mailbox */
  Export((Alias_t) DB_MBOX, (IDF) DB_MBOX);
  (Check who is the manager according to the RL script 58);
}
else {
  (send WITM to all 54)
  (wait for NMI messages to come 55)
  if (message.arg[0] == GetRoot()) role = DIR_MANAGER;
  else role = DIR_BACKUP;
  managerid = message.arg[0];
}
if (TEXFirstActivation) {
  int i;
  n = TEXGetNumTasks();
  (store number of tasks (n); 44)
  for (i = 0; i < n; i++) {
    status = TEXGetTaskStatus((IDF) i);
    (store in local database(i, status) 45)
  }
  (broadcast local database and receive the others’ databases 46)
  (build a global database 49)
  (mark as ‘reboot-resistant’ the whole database via DataReset() 50)
}
else {
  (request a copy of the global database 51)
}
LogError(EC_MESS, "Generic", "global database OK");

This code is used in section 6. 
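The role decision above boils down to a small pure function. The sketch below isolates it under stated assumptions: role_input_t, decide_role and the numeric values of DIR_MANAGER and DIR_BACKUP are hypothetical (the real constants come from dirdefs.h), and the WITM/NMI exchange is reduced to the manager id carried in arg[0] of the NMI reply.

```c
#include <assert.h>

/* Hypothetical role codes; the real values are defined in dirdefs.h. */
enum { DIR_MANAGER = 1, DIR_BACKUP = 2 };

typedef struct {
    int first_activation; /* TEXFirstActivation */
    int primary;          /* mystate.primary */
    int script_role;      /* role read from the RL script (db.role) */
    int nmi_manager;      /* arg[0] of the NMI reply, if one was received */
    int my_id;            /* GetRoot() */
} role_input_t;

/* Mirrors section 8: a new primary component trusts the RL script;
   any other component asks who the manager is and trusts the NMI reply. */
static int decide_role(const role_input_t *in)
{
    if (in->first_activation && in->primary)
        return in->script_role == DIR_BACKUP ? DIR_BACKUP : DIR_MANAGER;
    return in->nmi_manager == in->my_id ? DIR_MANAGER : DIR_BACKUP;
}
```

A restarted clone therefore becomes the manager only if its peers already name it as such, which is how a backup promoted in section 37 keeps its role after TEXRestartTask().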
9. The code for the manager of the DIR net.

(DIR net manager 9) =
{
  managerid = GetRoot();
  LogError(EC_ERROR, "Manager", "Manager starts...");
  /* the alarm of these timeouts simply sends a message of id <timeout-type>,
     subid = subid to the manager */
  (insert timeouts (IA-flag-timeout, MIA-timeouts, TAIA-timeouts) 10)
  (clear the suspicion_period[]’s 14)
  (clear IA_flag 16)
  LogError(EC_ERROR, "Manager", "Manager activates IAT...");
  (activate IAT 17)
  LogError(EC_ERROR, "Manager", "Manager loop starts...");
  (manager loop (waiting for incoming messages) 18)
} /* end manager */

This code is used in section 6. 

10. The manager initialises a set of timeout objects and inserts them into the timeout list [4] [5]. 

(insert timeouts (IA-flag-timeout, MIA-timeouts, TAIA-timeouts) 10) =
{
  tom = tom_init(send_timeout_message);
  (declare and insert MIA timeouts 11)
  (declare and insert the IA-flag timeout 12)
  (declare and insert TAIA timeouts 13)
}

This code is used in section 9. 

11. A “Manager Is Alive” (MIA) message must be sent to each backup at least once every MIA_DEADLINE
ticks.

(declare and insert MIA timeouts 11) =
{
  int i;
  for (i = 0; i < MAX_PROCS; i++) {
    if (i == GetRoot()) continue;
    tom_declare(mia + i, MIA_CYCLIC, TOM_SET_ENABLE, MIA_TIMEOUT, i, MIA_DEADLINE);
    tom_insert(tom, mia + i);
  }
#ifdef INJECT
  tom_declare(&inject, TOM_NON_CYCLIC, TOM_SET_ENABLE, INJECT_FAULT_TIMEOUT, i,
              INJECT_FAULT_DEADLINE);
  tom_insert(tom, &inject);
#endif
}

This code is used in section 10. 
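The timeout manager itself [4] [5] is not part of this section. To illustrate what the cyclic MIA timeouts above are meant to do, here is a deliberately simplified, tick-driven stand-in. Its types and functions (sketch_timeout_t, sketch_declare, sketch_renew, sketch_tick) are hypothetical and do not reproduce the real tom_declare/tom_insert/tom_renew interface. The assumed semantics: a timeout fires when its countdown reaches zero, a cyclic one re-arms itself, a non-cyclic one (like the TEIF timeout) is disabled, and renewing restarts the countdown without firing.

```c
#include <assert.h>

#define MAX_PROCS 4
#define MAX_FIRED 32

/* Hypothetical, simplified timeout record (not the real timeout_t). */
typedef struct {
    int type;      /* e.g. MIA_TIMEOUT: becomes message.type when it fires */
    int subid;     /* e.g. the backup to which a MIA is due */
    int deadline;  /* period, in ticks */
    int remaining; /* ticks left before the alarm fires */
    int cyclic;    /* re-arm after firing? */
    int active;
} sketch_timeout_t;

/* Alarms that fired, standing in for messages posted to the component. */
static int fired_type[MAX_FIRED], fired_subid[MAX_FIRED], n_fired = 0;

static void sketch_declare(sketch_timeout_t *t, int type, int subid,
                           int deadline, int cyclic)
{
    t->type = type;
    t->subid = subid;
    t->deadline = deadline;
    t->remaining = deadline;
    t->cyclic = cyclic;
    t->active = 1;
}

/* Like renewing a timeout: restart the countdown without firing. */
static void sketch_renew(sketch_timeout_t *t) { t->remaining = t->deadline; }

/* Advance every active timeout in the array by one tick. */
static void sketch_tick(sketch_timeout_t *ts, int n)
{
    for (int i = 0; i < n; i++) {
        sketch_timeout_t *t = &ts[i];
        if (!t->active || --t->remaining > 0)
            continue;
        if (n_fired < MAX_FIRED) {          /* record the alarm */
            fired_type[n_fired] = t->type;
            fired_subid[n_fired] = t->subid;
            n_fired++;
        }
        if (t->cyclic)
            t->remaining = t->deadline;     /* cyclic: re-arm */
        else
            t->active = 0;                  /* one-shot: disable */
    }
}
```

As in section 11, the slot of the local node is simply left undeclared. Renewing a timeout just before it expires postpones its alarm, which mirrors what section 20 does when a broadcast already carries an implicit MIA.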
12. The IA_flag must be cleared at least once every IMALIVE_CLEAR_TIMEOUT ticks, or the IAT will consider
this component crashed. This is accomplished by means of the following cyclic timeout:

(declare and insert the IA-flag timeout 12) =
{
  tom_declare(&ia, IA_FLAG_CYCLIC, TOM_SET_ENABLE, IA_FLAG_TIMEOUT, 1, IMALIVE_CLEAR_TIMEOUT);
  tom_insert(tom, &ia);
}

This code is used in sections 10 and 34. 
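Sections 12 and 16 describe a watchdog shared between a component and its I’m Alive Task. The IAT code (section 61) is not shown here, so the sketch below assumes one plausible reading of the protocol: at each check the IAT treats a still-set flag as a missed deadline; otherwise it sets the flag again and waits for the component to clear it. component_heartbeat and iat_check are hypothetical names. The sketch is single-threaded; in the real system the flag is shared between threads, so accesses would need proper synchronisation.

```c
#include <assert.h>

/* Shared flag: the component clears it (section 16), the IAT sets it. */
static int IA_flag = 0;

/* Component side: run on every IA_FLAG_TIMEOUT and after every message. */
static void component_heartbeat(void) { IA_flag = 0; }

/* IAT side, assumed to run once per IMALIVE_CLEAR_TIMEOUT period.
   Returns 1 if the component cleared the flag since the last check,
   0 if it did not (the IAT would then raise an alarm, e.g. a TEIF). */
static int iat_check(void)
{
    if (IA_flag)
        return 0;   /* never cleared: component presumed crashed */
    IA_flag = 1;    /* arm for the next period */
    return 1;
}
```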

13. The Manager must receive a “This Agent Is Alive” (TAIA) message from each backup at least once every
TAIA_DEADLINE ticks.

(declare and insert TAIA timeouts 13) =
{
  int i;
  for (i = 0; i < MAX_PROCS; i++) {
    if (i == GetRoot()) continue;
    tom_declare(taia + i, TAIA_CYCLIC, TOM_SET_ENABLE, TAIA_TIMEOUT, i, TAIA_DEADLINE);
    tom_insert(tom, taia + i);
  }
}

This code is used in section 10. 

14. Suspicion periods are globally set to off.
(clear the suspicion_period[]’s 14) =
{
  int i;
  for (i = 0; i < MAX_PROCS; i++) {
    suspicion_period[i] = 0;
  }
}

This code is used in section 9. 

15. Same as above, but for the backup agent. The backup manages only one suspicion period, namely the
manager’s; we use entry 0 for it.

(clear suspicion_period 15) =
*suspicion_period = 0;

This code is used in section 33. 

16. IA_flag is shared between this component and its I’m Alive Task. It has to be cleared at initialisation
time and on each new message of type IA_FLAG_TIMEOUT.

(clear IA_flag 16) =
IA_flag = 0;

This code is used in sections 9, 18, 33, and 37. 
17. This message wakes up the I’m Alive Task, which will then periodically check whether the IA_flag
has been cleared. (Note: each component knows that there are nProcs-1 fellows in the DIR net. It is assumed
that fellow i gets mail on mailbox i, so the first nProcs-1 integers are reserved for this purpose. Furthermore,
each node shall export its mailbox via Export(GetRoot(), (IDF) GetRoot()).) The mailbox whose id
and alias are nProcs is assumed to belong to the I’m Alive Task; mailbox nProcs+1 belongs to the recovery thread.

(activate IAT 17) =
message.type = ROUSE;
message.subid = GetRoot();
TEXSendMessage(IAT_MBOX, (char *) &message, sizeof(message));

This code is used in sections 9, 18, 33, and 37. 
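The numbering convention in the note above can be captured in a tiny classifier. The helper names are hypothetical, and the sketch follows the note only: section 4 derives mailbox ids from DIR_MBOX_OFFSET instead, so this illustrates the convention as described rather than the macros.

```c
#include <assert.h>

/* Kinds of mailbox under the convention described in section 17. */
enum mbox_kind { MBOX_FELLOW, MBOX_IAT, MBOX_RECOVERY, MBOX_UNKNOWN };

/* Hypothetical helpers: fellow i on mailbox i, the IAT on nProcs,
   the recovery thread on nProcs + 1. */
static int fellow_mbox(int i)          { return i; }
static int iat_mbox(int nprocs)        { return nprocs; }
static int recovery_mbox(int nprocs)   { return nprocs + 1; }

static enum mbox_kind classify_mbox(int id, int nprocs)
{
    if (id >= 0 && id < nprocs) return MBOX_FELLOW;
    if (id == nprocs)           return MBOX_IAT;
    if (id == nprocs + 1)       return MBOX_RECOVERY;
    return MBOX_UNKNOWN;
}
```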
18. This loop is the real core of the manager. It has to deal with a number of messages coming from
the timeout manager, its fellow backups, the recovery thread, and the remote I’m Alive Tasks. The core of
the fault-tolerant strategy of the DIR net lies here.

(manager loop (waiting for incoming messages) 18) =
while (1) {
  (wait for an incoming message 53)
  tom_dump(tom);
  switch (message.type) {
  case INJECT_FAULT_TIMEOUT:
    LogError(EC_ERROR, "Manager loop", "Fault injection");
    tom_close(tom); /* the time-out manager is detached */
    break;

  case IA_FLAG_TIMEOUT:
    LogError(EC_ERROR, "Manager loop", "IA_FLAG_TIMEOUT message -> Clear IA-flag.");
    /* time to clear the IA-flag! */
    (clear IA_flag 16)
    break;

  case MIA_TIMEOUT:
    LogError(EC_ERROR, "Manager loop",
             "MIA_TIMEOUT message (time to send a MIA to Backup %d).", message.subid);
    /* time to send a MIA to a backup */
    (send MIA to backup subid 19)
    tom_dump(tom + message.subid);
    tom_renew(tom, mia + message.subid);
    break;

  case DB:
    LogError(EC_ERROR, "Manager loop", "DB message."); /* a message which modifies the db */
    if (message.subid == GetRoot()) /* the message is local */
    {
      (renew MIA timeout on all subid’s 20)
      (update your copy of the db 22)
      (broadcast modifications to db 21)
      break;
    }
    else /* if it’s a remote message, it’s also a piggybacked TAIA */
    {
      (update your copy of the db 22)
    } /* don’t break: fall through to TAIA */

  case TAIA:
    LogError(EC_ERROR, "Manager loop", "TAIA message from node %d.", message.subid);
    /* if a TAIA comes in or a remote DB message comes in... */
    if (!tom_ispresent(tom, taia + message.subid)) {
      (insert TAIA-timeout, subid=subid 23)
      (broadcast NIUA 24) /* node is up again! */
    }
    /* if you get a TAIA while expecting a TEIF, then no problem,
       simply get out of the suspicion period */
    if (suspicion_period[message.subid] == TRUE) {
      LogError(EC_ERROR, "Manager",
               "got a TAIA while expecting a TEIF => get out of the suspicion period");
      suspicion_period[message.subid] = FALSE;
    }
    else {
      LogError(EC_ERROR, "Manager", "got a due TAIA in time - renewing TAIA timeout %d.",
               message.subid);
      (renew TAIA-timeout, subid 25)
    }
    break;

  case TAIA_TIMEOUT:
    LogError(EC_ERROR, "Manager loop",
             "TAIA_TIMEOUT message: no heartbeat from Backup %d - entering suspicion period",
             message.subid);
    /* no heartbeat from a remote component... enter a suspicion period then. */
    suspicion_period[message.subid] = TRUE;
    (insert TEIF timeout 26)
    (delete timeout (TAIA-timeout, subid) 27)
    break;

  case TEIF:
    LogError(EC_ERROR, "Manager loop",
             "TEIF message: IAT@node%d sent an alarm and went to sleep.", message.subid);
    /* a TEIF message has been sent from an IAT: as its last action, the IAT went to sleep */
    if (suspicion_period[message.subid] == TRUE) { /* (delete timeout (TAIA-timeout, subid)) */
      (delete timeout (TEIF-timeout, subid) 28)
      suspicion_period[message.subid] = FALSE;
      /* agent recovery will spawn a clone of the backup subid.
         If no backup clones are available, the entire node will be rebooted. */
      (Agent-Recovery(subid) 29)
    }
    else {
      if (message.subid == GetRoot()) {
        (clear IA_flag 16)
        (activate IAT 17)
      }
      else {
        int to = message.subid;
        message.type = ENIA; /* i.e., "ENable IAt" */
        message.subid = GetRoot();
        RemoteSendMessage(to, ALIAS(to), (char *) &message, sizeof(message));
      }
    }
    break;

  case TEIF_TIMEOUT:
    LogError(EC_ERROR, "Manager loop", "TEIF_TIMEOUT message - The Manager concludes "
             "that the suspected node (%d) has crashed.", message.subid);
    if (suspicion_period[message.subid] == TRUE) {
      (delete timeout (TAIA-timeout, subid) 27)
      suspicion_period[message.subid] = FALSE;
      /* the entire node will be rebooted. */
      (Node-Recovery(subid) 31)
    }
    break;

  case ENIA:
    LogError(EC_ERROR, "Manager loop", "ENIA message.");
    (clear IA_flag 16)
    /* if an IAT gets an "activate" message while it’s active, that message is ignored */
    (activate IAT 17)
    break;
  case WITM:
    LogError(EC_ERROR, "Manager loop", "WITM message.");
    sender = message.subid;
    message.subid = message.arg[0] = GetRoot();
    message.type = NMI;
    RemoteSendMessage(sender, ALIAS(sender), (char *) &message, sizeof(message));
    break;

  case NIUA:
    LogError(EC_ERROR, "Backup",
             "NIUA message - node %d is up again - watching restarts...", message.subid);
    /* a Node Is Up Again: restart watchin’ it */
    if (!tom_ispresent(tom, taia + message.subid)) tom_insert(tom, taia + message.subid);
    break;

  case REQUEST_DB:
    LogError(EC_ERROR, "Backup", "REQUEST_DB message.");
    /* a node is requesting a full copy of the database */
    RemoteSendMessage(message.subid, ALIAS(message.subid), (char *) &db, sizeof(db));
    break;

  default:
    LogError(EC_ERROR, "Backup", "Other messages.");
    (deal with these other messages 32)
  } /* end switch (message.type) */

  (clear IA_flag 16)
  (renew IA-flag-timeout 43)
} /* end manager loop */

This code is used in section 9. 
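Stripped of logging and message passing, the manager loop tracks each backup through a small state machine: a missed TAIA opens a suspicion period, and whichever arrives first, a TEIF or the TEIF timeout, decides between agent recovery (SPAN) and node recovery. The sketch below models that logic for a single backup. The state, event and action names are hypothetical, and the actions only label what sections 23 to 31 would do. A TAIA_TIMEOUT outside the watching state cannot occur in the original (the TAIA timeout is deleted on entering suspicion), so the sketch simply ignores it.

```c
#include <assert.h>

/* Events, named after the message types handled in section 18. */
typedef enum { EV_TAIA, EV_TAIA_TIMEOUT, EV_TEIF, EV_TEIF_TIMEOUT } event_t;

/* How the manager currently regards one backup (hypothetical names). */
typedef enum {
    ST_WATCHING,  /* TAIA timeout in the list, no suspicion */
    ST_SUSPECTED, /* suspicion_period[subid] == TRUE, TEIF timeout pending */
    ST_UNWATCHED  /* recovery started; TAIA timeout removed */
} state_t;

typedef enum {
    ACT_NONE,
    ACT_RENEW_TAIA,      /* renew the TAIA timeout (section 25) */
    ACT_ENTER_SUSPICION, /* insert TEIF timeout, delete TAIA timeout */
    ACT_NODE_UP,         /* reinsert TAIA timeout, broadcast NIUA */
    ACT_AGENT_RECOVERY,  /* send SPAN to the IAT of node subid */
    ACT_WAKE_IAT,        /* send ENIA to node subid */
    ACT_NODE_RECOVERY    /* reboot node subid */
} action_t;

static action_t manager_step(state_t *s, event_t ev)
{
    switch (ev) {
    case EV_TAIA:
        if (*s == ST_WATCHING) return ACT_RENEW_TAIA;
        *s = ST_WATCHING;   /* TAIA timeout was absent: node is up again */
        return ACT_NODE_UP;
    case EV_TAIA_TIMEOUT:
        if (*s != ST_WATCHING) return ACT_NONE;
        *s = ST_SUSPECTED;
        return ACT_ENTER_SUSPICION;
    case EV_TEIF:
        if (*s != ST_SUSPECTED) return ACT_WAKE_IAT;
        *s = ST_UNWATCHED;  /* the IAT answered: only the agent died */
        return ACT_AGENT_RECOVERY;
    case EV_TEIF_TIMEOUT:
        if (*s != ST_SUSPECTED) return ACT_NONE;
        *s = ST_UNWATCHED;  /* not even the IAT answered: the node died */
        return ACT_NODE_RECOVERY;
    }
    return ACT_NONE;
}
```

Once recovery has started, the next TAIA from that node finds the TAIA timeout missing, which is what brings the backup back under watch and triggers the NIUA broadcast.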

19. A “Manager Is Alive” message needs to be sent to component message.subid: 

(send MIA to backup subid 19) =
message.type = MIA; /* Note: MIA.arg[0] == managerid!! */
message.arg[0] = GetRoot();
RemoteSendMessage(message.subid, ALIAS(message.subid), (char *) &message, sizeof(message));

This code is used in section 18. 

20. A broadcast is about to take place; this implies that a series of implicit MIAs will be sent piggybacked
on it. As a consequence, all MIA_SEND_TIMEOUTs need to be renewed.

(renew MIA timeout on all subid’s 20) =
{
  int i;
  for (i = 0; i < GetRoot(); i++) {
    tom_renew(tom, mia + i);
  }
  for (i++; i < MAX_PROCS; i++) {
    tom_renew(tom, mia + i);
  }
}

This code is used in section 18. 
21. A message has modified the database, and that message was local, i.e., generated on this node. To keep
all instances of the database up to date, we need to propagate this message to all the other components:

(broadcast modifications to db 21) =
{
  int i;
  for (i = 0; i < GetRoot(); i++) {
    RemoteSendMessage(i, ALIAS(i), (char *) &message, sizeof(message));
  }
  for (i++; i < MAX_PROCS; i++) {
    RemoteSendMessage(i, ALIAS(i), (char *) &message, sizeof(message));
  }
}

This code is used in sections 18 and 37. 
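Sections 20 and 21 skip the local node with two loops instead of testing i != GetRoot() on every iteration; the second loop starts with i++ because the first one leaves i equal to the local id. The sketch below lists the recipients this idiom produces, using a hypothetical broadcast_targets helper and a MAX_PROCS of 8 chosen only for illustration.

```c
#include <assert.h>

#define MAX_PROCS 8

/* Fills out[] with the ids a broadcast from node self would reach,
   using the two-loop idiom of sections 20 and 21. Returns the count. */
static int broadcast_targets(int self, int out[MAX_PROCS])
{
    int i, n = 0;
    for (i = 0; i < self; i++)
        out[n++] = i;
    for (i++; i < MAX_PROCS; i++)   /* i == self here, so skip it */
        out[n++] = i;
    return n;
}
```

The result is the same as a single loop guarded by if (i != self), which is the form section 24 uses when it must also skip a second node.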

22. The message that just arrived is of type DB, i.e., it concerns the database, so our copy of the database
needs to be updated.

(update your copy of the db 22) =
switch (message.arg[0]) {
case DB_NEW_STATUS: db.node[message.subid].status = message.arg[1];
  break;
case DB_NEW_ROLE: db.node[message.subid].role = message.arg[1];
  break;
case DB_INC_REBOOT: db.node[message.subid].reboot_nr++;
  break;
case DB_NEW_TASK_STATUS: db.node[message.subid].task[message.arg[1]].status = message.arg[2];
  break;
case DB_NEW_TASK_ERROR: db.node[message.subid].task[message.arg[1]].status = message.arg[2];
  break;
}

This code is used in sections 18 and 37. 
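The update above can be exercised in isolation against a stand-in database. The layout of db_t, the task table size and the numeric DB_* codes below are assumptions made for illustration, since the real definitions live in headers not shown here. Like the original, the sketch stores DB_NEW_TASK_ERROR into the task’s status field; unlike the original, it also rejects out-of-range node and task indices, a check worth having since those indices arrive over the network.

```c
#include <assert.h>

#define MAX_PROCS 4
#define MAX_TASKS 8
#define MAXARG 5

/* Hypothetical operation codes and database layout. */
enum { DB_NEW_STATUS, DB_NEW_ROLE, DB_INC_REBOOT,
       DB_NEW_TASK_STATUS, DB_NEW_TASK_ERROR };

typedef struct { int status; } task_rec_t;
typedef struct {
    int status, role, reboot_nr;
    task_rec_t task[MAX_TASKS];
} node_rec_t;
typedef struct { node_rec_t node[MAX_PROCS]; } db_t;

typedef struct { int arg[MAXARG]; int subid; int type; char local; } message_t;

/* Applies one DB message: arg[0] selects the operation, subid names the
   node, arg[1] and arg[2] carry the operands (as in section 22).
   Returns 0 on success, -1 on an unknown code or out-of-range index. */
static int db_apply(db_t *db, const message_t *m)
{
    if (m->subid < 0 || m->subid >= MAX_PROCS)
        return -1;
    node_rec_t *nd = &db->node[m->subid];
    switch (m->arg[0]) {
    case DB_NEW_STATUS: nd->status = m->arg[1]; return 0;
    case DB_NEW_ROLE:   nd->role = m->arg[1];   return 0;
    case DB_INC_REBOOT: nd->reboot_nr++;        return 0;
    case DB_NEW_TASK_STATUS:
    case DB_NEW_TASK_ERROR:  /* section 22 stores both into .status */
        if (m->arg[1] < 0 || m->arg[1] >= MAX_TASKS)
            return -1;
        nd->task[m->arg[1]].status = m->arg[2];
        return 0;
    }
    return -1;
}
```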

23. A TAIA_SEND_TIMEOUT needs to be reinserted into the timeout list.

(insert TAIA-timeout, subid=subid 23) =
tom_insert(tom, taia + message.subid);

This code is used in section 18. 


24. A node is up again. Broadcast the news, skipping of course yourself and the node that has come back to life:
(broadcast NIUA 24) =
{
  int i, n, this; /* note: message.subid is the id of the Node which Is Up Again */
  for (message.type = NIUA, n = MAX_PROCS, i = 0, this = GetRoot(); i < n; i++)
    if (i != this && i != message.subid)
      RemoteSendMessage(i, ALIAS(i), (char *) &message, sizeof(message));
}

This code is used in section 18. 

25. A node is up again, so we need to start watching its component again.

(renew TAIA-timeout, subid 25) =
tom_renew(tom, taia + message.subid);

This code is used in section 18. 
26. We have just entered a suspicion period, and we need to distinguish the case ‘an agent is down’ from
the case ‘a complete node is down’. To do so, we wait for at most IMALIVE_SET_TIMEOUT for some sign of
life coming from the I’m Alive Task on node message.subid. This is managed by simply inserting an acyclic
timeout in the timeout list as follows:

(insert TEIF timeout 26) =
tom_declare(&teif, TEIF_CYCLIC, TOM_SET_ENABLE, TEIF_TIMEOUT, message.subid,
            IMALIVE_SET_TIMEOUT);
tom_insert(tom, &teif);

This code is used in sections 18 and 37. 

27. If we are suspecting an agent, there’s no need to expect anything from it (doesn’t it sound like
philosophy? ;-), so we suspend the cyclic TAIA timeout by temporarily deleting it from the
list.

(delete timeout (TAIA-timeout, subid) 27) =
tom_delete(tom, taia + message.subid);

This code is used in section 18. 

28. Luckily, only the remote component has crashed, not the entire node it was running on: we can infer
this because a TEIF message has been sent by the I’m Alive Task of that node. The TEIF
timeout is consequently deleted.

(delete timeout (TEIF-timeout, subid) 28) =
tom_delete(tom, &teif);

This code is used in section 18. 

29. Agent recovery will spawn a clone of the backup subid. If no backup clones are available, the entire
node will be rebooted. Now the problem is: who can do that? The only component still alive on that
node is... the I’m Alive Task, so the only way to accomplish this is to send a “spawn new
component” (SPAN) message to that I’m Alive Task:

(Agent-Recovery(subid) 29) =
message.type = SPAN;
RemoteSendMessage(message.subid, IAT_ALIAS, (char *) &message, sizeof(message));

This code is used in section 18. 

30. Same as above, but for the manager. 

(Manager-Recovery(managerid) 30) =
message.type = SPAN;
RemoteSendMessage(managerid, IAT_ALIAS, (char *) &message, sizeof(message));

This code is used in section 37. 


31. This section covers the case in which no sign of life comes from a suspected node. The node is therefore
rebooted. Open problem: if TEX has a local scope, who can do this? Probably this will require TEX to
send a message to its host via a (burst of) UDP writes; the host will then take care of rebooting the node
in question or of doing something else, e.g., triggering an alarm for the operator. Anyway, for us this is simply
a TEXReboot().

(Node-Recovery(subid) 31) = /* TEXReboot(message.subid); TEXReset(message.subid); */

This code is used in sections 18 and 37. 
32. Other messages are foreseen; their handling will be added here. Note: for the moment, this
section is shared between the manager and the backups.

(deal with these other messages 32) =
{
  switch (message.type) { /* ...to be added... */
  } /* end switch */
}

This code is used in sections 18 and 37. 

33. A backup agent can be considered as the manager of a system collapsed to its own node: it only takes
local-scope decisions and actions. Nevertheless, much of its code is inherited almost without modification
from the Manager, and its structure is basically the same as the latter’s.

(DIR net backup agent 33) =
{
  LogError(EC_ERROR, "Backup", "Backup starts...");
  /* the alarm of these timeouts simply sends a message of id [timeout-type],
     subid = subid to the backup agent */
  (insert four timeouts (IA-flag, MIA, TAIA, and TEIF) 34)
  /* TimeWaitHigh(TimeNowHigh() + 300000); */
  /* suspicion period is set to off */
  (clear suspicion_period 15)
  (clear IA_flag 16)
  LogError(EC_ERROR, "Backup", "activating IAT...");
  LogError(EC_ERROR, "Backup", "Backup activates IAT...");
  (activate IAT 17)
  LogError(EC_ERROR, "Backup", "Backup loop starts...");
  (backup loop (waiting for incoming messages) 37)
} /* end backup */

This code is used in section 6. 

34. The backup initialises a set of timeout objects and inserts them into the timeout list, more or less the
way the manager does, except that there is just one MIA timeout coming in and one TAIA going out.

(insert four timeouts (IA-flag, MIA, TAIA, and TEIF) 34) =
{
  tom = tom_init(send_timeout_message);
  (declare and insert the one MIA timeout 35)
  (declare and insert the IA-flag timeout 12)
  (declare and insert the one TAIA timeout 36)
}

This code is used in section 33. 

35. A backup must receive a “Manager Is Alive” (MIA) message at least once every MIA_DEADLINE_B ticks.
Note how, regardless of the actual value of managerid, the entry being filled in is always entry 0.

(declare and insert the one MIA timeout 35) =
{
  LogError(EC_ERROR, "backup", "managerid == %d.", managerid);
  tom_declare(mia + managerid, MIA_CYCLIC_B, TOM_SET_ENABLE, MIA_TIMEOUT_B, managerid,
              MIA_DEADLINE_B);
  tom_insert(tom, mia + managerid);
  tom_dump(tom);
}

This code is used in section 34. 
36. This backup must send a “This Agent Is Alive” (TAIA) message to the Manager at least once every
TAIA_DEADLINE_B ticks. Note how, regardless of the actual value of managerid, the entry being filled in is
always entry 0.

(declare and insert the one TAIA timeout 36) =
{
  tom_declare(taia, TAIA_CYCLIC_B, TOM_SET_ENABLE, TAIA_TIMEOUT_B, 0, TAIA_DEADLINE_B);
  tom_insert(tom, taia);
}

This code is used in section 34. 
37. This loop is the real core of the backup agent, just as the manager loop is for the manager. It has
to deal with a number of messages coming from the timeout manager, the manager, and remote I’m Alive Tasks.
As we said in the corresponding section of the manager, this is the core of the fault-tolerant strategy of the
DIR net.

(backup loop (waiting for incoming messages) 37) =
while (1) {
  (wait for an incoming message 53)
  LogError(EC_ERROR, "Backup", "message received %d [type == %s] from node %d",
           message.type, DIRPrintCode(message.type), message.local ? GetRoot() : message.subid);
  switch (message.type) {
  case IA_FLAG_TIMEOUT: /* time to clear the IA-flag! */
    (clear IA_flag 16)
    LogError(EC_ERROR, "Backup loop", "IA_FLAG_TIMEOUT -> IA-flag cleared");
    break;

  case TAIA_TIMEOUT_B: /* time to send a TAIA to the manager */
    LogError(EC_ERROR, "Backup loop", "TAIA_TIMEOUT_B -> send TAIA to manager");
    (send TAIA to the manager 38)
    break;

  case DB: /* a message which modifies the db */
    LogError(EC_ERROR, "Backup loop", "DB message.");
    if (message.subid == GetRoot()) /* the message is local */
    {
      (update your copy of the db 22)
      (broadcast modifications to db 21)
      /* this is sent also to the manager, therefore it is a TAIA in piggybacking. */
      (renew TAIA timeout 39)
    }
    else {
      (update your copy of the db 22)
      if (message.subid == managerid) tom_renew(tom, mia + managerid);
    }
    break;

  case MIA:
    LogError(EC_ERROR, "Backup loop", "MIA message: managerid==%d, arg[0]==%d, %s.",
             managerid, message.arg[0],
             (managerid == message.arg[0]) ? "equal" : "different");
    /* if a MIA comes in... */
    if (!tom_ispresent(tom, mia + message.arg[0])) {
      LogError(EC_ERROR, "Backup loop", "MIA timeout %d is not present!", message.arg[0]);
      tom_dump(tom);
      /* a new manager has been chosen */
      tom_delete(tom, mia + managerid);
      LogError(EC_ERROR, "Backup loop", "MIA timeout %d deleted!", managerid);
      managerid = message.arg[0];
      LogError(EC_ERROR, "Backup loop", "New manager is %d", managerid);
    }
    tom_renew(tom, mia + managerid);
    /* if you get a MIA while expecting a TEIF, then no problem,
       simply get out of the suspicion period */
    if (*suspicion_period == TRUE) {
      *suspicion_period = FALSE;
      LogError(EC_ERROR, "Backup",
               "got a MIA while expecting a TEIF - got out of the suspicion period!");
    }
    break;
  case MIA_TIMEOUT_B:
    LogError(EC_ERROR, "Backup loop",
             "MIA_TIMEOUT_B message: no heartbeat from the manager - suspicion period entered");
    /* no heartbeat from the manager... enter a suspicion period then */
    *suspicion_period = TRUE;
    message.subid = managerid;
    LogError(EC_ERROR, "Backup loop", "About to insert a TEIF");
    (insert TEIF timeout 26)
    LogError(EC_ERROR, "Backup loop", "TEIF inserted");
    tom_delete(tom, mia + managerid);
    break;

  case TEIF:
    LogError(EC_ERROR, "Backup loop",
             "TEIF message - received a message from IAT@node%d", message.subid);
    /* a TEIF message has been sent from an IAT: as its last action, the IAT went to sleep */
    if (*suspicion_period == TRUE && message.subid == managerid) {
      /* delete timeout (MIA-timeout, subid); tom_delete(tom, mia + managerid); */
      tom_delete(tom, &teif);
      *suspicion_period = FALSE;
      LogError(EC_ERROR, "Backup loop", "Manager needs to be recovered!");
      /* Manager recovery will spawn a clone of the backup subid. If no more manager
         clones are available, the entire node will be rebooted. */
      (Manager-Recovery(managerid) 30)
    }
    else {
      LogError(EC_ERROR, "Backup loop",
               "IAT@node%d needs to be awakened - sent ENIA to Component@node%d",
               message.subid, message.subid);
      /* send ENIA (i.e., "ENable IAt") to subid */
      message.type = ENIA;
      RemoteSendMessage(message.subid, ALIAS(message.subid), (char *) &message,
                        sizeof(message));
      tom_renew(tom, mia + managerid);
    }
    break;

  case TEIF_TIMEOUT_B:
    LogError(EC_ERROR, "Backup loop", "TEIF_TIMEOUT_B message.");
    if (*suspicion_period == TRUE) {
      LogError(EC_ERROR, "Backup loop", "the node is suspected => has crashed.");
      (delete MIA-timeout coming from subid, if any exists 40)
      *suspicion_period = FALSE;
      /* the entire node will be rebooted. */
      LogError(EC_ERROR, "Backup loop", "node recovery!");
      (Node-Recovery(subid) 31)
      LogError(EC_ERROR, "Backup loop", "'a node is down' message sent around");
      (send ANID to all except managerid 41)
      LogError(EC_ERROR, "Backup loop", "choice of new manager");
      (choose next manager 42)
      /* if this backup is to be the new manager... */
      if (managerid == GetRoot()) {
        mystate.role = DIR_MANAGER;
        TEXSetState(&mystate);
        TEXRestartTask();
      }
    }
    else {
      LogError(EC_ERROR, "Backup loop",
               "IAT of the manager needs to be awakened - sent ENIA to Manager@node%d",
               message.subid);
      /* send ENIA (i.e., "ENable IAt") to managerid */
      message.type = ENIA;
      RemoteSendMessage(managerid, ALIAS(managerid), (char *) &message, sizeof(message));
      /* renew MIA-timeout */
      tom_renew(tom, mia + managerid);
    }
    break;
case WITM:
  LogError(EC_ERROR, "Backup loop", "WITM message.");
  sender = message.subid;
  message.arg[0] = managerid;
  message.subid = GetRoot();
  message.type = NMI;
  RemoteSendMessage(sender, ALIAS(sender), (char *) &message, sizeof(message));
  break;

case ENIA:
  LogError(EC_ERROR, "Backup loop", "ENIA message.");
  ⟨clear IA-flag 16⟩
  /* if an IAT gets an "activate" message while it's active, the message is ignored */
  ⟨activate IAT 17⟩
  break;

case NMI:
  LogError(EC_ERROR, "Backup loop", "NMI message.");
  if (message.arg[0] == managerid) break;
  /* something is going wrong - someone has a different managerid */
  LogError(EC_ERROR, "Backup loop",
           "node %d thinks the manager is %d, while I think it is %d.",
           message.subid, message.arg[0], managerid);
  break;

case REQUEST_DB:
  LogError(EC_ERROR, "Backup loop", "REQUEST_DB message.");
  /* a node is requesting a full copy of the database */
  RemoteSendMessage(message.subid, ALIAS(message.subid), (char *) &db, sizeof(db));
  break;

default:
  LogError(EC_ERROR, "Backup loop", "other messages.");
  ⟨deal with these other messages 32⟩
} /* end switch (message.type) */
⟨clear IA-flag 16⟩
⟨renew IA-flag-timeout 43⟩
} /* end backup loop */

This code is used in section 33. 
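The WITM/NMI exchange handled in the backup loop above can be illustrated in isolation. The following is a minimal sketch, not the DIR net code: the message layout `demo_msg_t` (with `arg0` standing for `message.arg[0]`) and the codes `DEMO_WITM` and `DEMO_NMI` are hypothetical stand-ins, and the actual sending is replaced by returning the reply together with its destination.

```c
#include <assert.h>

/* Hypothetical message codes and a reduced message shape: only the fields
   used by the WITM/NMI exchange are modelled. */
enum { DEMO_WITM = 1, DEMO_NMI = 2 };
typedef struct { int type; int subid; int arg0; } demo_msg_t;

/* WITM ("Who Is The Manager"): answer with an NMI carrying our own idea of
   the manager in arg0 and our node id in subid; the reply goes to the asker. */
static demo_msg_t answer_witm(const demo_msg_t *in, int my_node, int managerid, int *reply_to)
{
    demo_msg_t out;
    assert(in->type == DEMO_WITM);
    *reply_to = in->subid;      /* the asker identified itself in subid */
    out.type = DEMO_NMI;
    out.subid = my_node;
    out.arg0 = managerid;
    return out;
}

/* NMI: returns 1 if the sender disagrees with us about who the manager is. */
static int nmi_disagrees(const demo_msg_t *in, int managerid)
{
    assert(in->type == DEMO_NMI);
    return in->arg0 != managerid;
}
```

A disagreement is only logged by the backup loop; the sketch likewise just reports it to the caller.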

38. A “This Agent Is Alive” message needs to be sent to the manager. 

⟨send TAIA to the manager 38⟩ ≡
  message.type = TAIA;
  message.subid = GetRoot();
  LogError(EC_ERROR, "Backup", "sending TAIA to manager...");
  RemoteSendMessage(managerid, ALIAS(managerid), (char *) &message, sizeof(message));
  LogError(EC_ERROR, "Backup", "...TAIA sent to manager.");

This code is used in section 37. 
39. The local database has been modified; this calls for a broadcast of the modifications. Since the
manager is one of the recipients, the broadcast implicitly carries (piggybacks) a TAIA, so the TAIA
timeout can be renewed.

⟨renew TAIA timeout 39⟩ ≡
  tom_renew(tom, taia);

This code is used in section 37. 
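A minimal sketch of the piggybacking idea, assuming a simple deadline-based timer rather than the actual timeout manager: `demo_timeout_t`, `demo_renew`, `demo_expired` and `demo_on_db_broadcast` are hypothetical names. Renewing the TAIA timeout after a broadcast postpones the moment at which an explicit TAIA would otherwise have to be sent.

```c
#include <assert.h>

/* Hypothetical stand-in for a timeout-manager entry: only the fields needed
   to show renewal are modelled (the real TOM API is not used here). */
typedef struct {
    long period;    /* cycle length, in abstract ticks */
    long deadline;  /* tick at which an explicit TAIA would be sent */
} demo_timeout_t;

/* Renewing pushes the deadline one full period past the current time. */
static void demo_renew(demo_timeout_t *t, long now)
{
    assert(t->period > 0);
    t->deadline = now + t->period;
}

/* Returns 1 if the timeout has expired at time 'now'. */
static int demo_expired(const demo_timeout_t *t, long now)
{
    return now >= t->deadline;
}

/* A database broadcast also reaches the manager, so it doubles as a TAIA:
   instead of sending a separate TAIA, the backup just renews the timeout. */
static void demo_on_db_broadcast(demo_timeout_t *taia, long now)
{
    /* ... sending the modifications to every node is omitted ... */
    demo_renew(taia, now);
}
```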

40. A TEIF_TIMEOUT_B has come from node message.subid. Here we build a timeout object and try to
delete the entry pertaining to node message.subid and holding a MIA_TIMEOUT_B, if one exists.

⟨delete MIA-timeout coming from subid, if any exists 40⟩ ≡
{
  timeout_t t;
  tom_declare(&t, MIA_CYCLIC_B, TOM_SET_ENABLE, MIA_TIMEOUT_B, message.subid, MIA_DEADLINE_B);
  tom_delete(tom, &t);
}

This code is used in section 37. 
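The pattern "build a probe object, then delete whatever entry matches it" can be sketched with a plain array. This is an illustration under stated assumptions: `key_timeout_t` and `delete_matching` are hypothetical, and matching on the pair (id, subid) only approximates how the timeout manager compares entries.

```c
#include <assert.h>

/* Hypothetical, simplified timeout entry: the real timeout manager matches
   entries built with tom_declare; here only (id, subid) are compared. */
typedef struct {
    int id;     /* e.g. a MIA_TIMEOUT_B-like code */
    int subid;  /* node the timeout refers to */
} key_timeout_t;

/* Removes the first entry whose id and subid match 'probe'.
   Returns 1 if an entry was removed, 0 if none existed. */
static int delete_matching(key_timeout_t *list, int *count, key_timeout_t probe)
{
    int i;
    assert(*count >= 0);
    for (i = 0; i < *count; i++) {
        if (list[i].id == probe.id && list[i].subid == probe.subid) {
            list[i] = list[*count - 1];   /* order is not preserved */
            (*count)--;
            return 1;
        }
    }
    return 0;
}
```

Deleting a non-existent entry is harmless, which matches the "if any exists" wording of the section name.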

41. “A Node Is Down” (ANID) is sent to every node except the down node (managerid) and this node itself.

⟨send ANID to all except managerid 41⟩ ≡
{
  int i, n, this;
  for (i = 0, message.type = ANID, n = MAX_PROCS, this = GetRoot(); i < n; i++)
    if (i != managerid && i != this)
      RemoteSendMessage(i, ALIAS(i), (char *) &message, sizeof(message));
}

This code is used in section 37. 
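The recipient set of the ANID broadcast can be computed separately from the sending. In this sketch `anid_recipients` is a hypothetical helper, and node ids are assumed to range over [0, nprocs).

```c
#include <assert.h>

/* Fills 'out' with the nodes that should receive "A Node Is Down":
   every node in [0, nprocs) except the down node and the sender itself.
   Returns the number of recipients. */
static int anid_recipients(int nprocs, int down, int self, int *out)
{
    int i, k = 0;
    assert(nprocs > 0);
    for (i = 0; i < nprocs; i++)
        if (i != down && i != self)
            out[k++] = i;
    return k;
}
```

For example, with five nodes, node 2 down and node 4 sending, the recipients are nodes 0, 1 and 3.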

42. This is the most naive strategy for choosing a new manager: the next node, in circular order, takes
over. A more sophisticated (and safer) protocol will be used in the future.

⟨choose next manager 42⟩ ≡
  managerid++;
  if (managerid >= MAX_PROCS) managerid = 0;  /* valid node ids are 0 .. MAX_PROCS - 1 */

This code is used in section 37. 
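The same round-robin rule can be written with a modulo, so that the result always stays in [0, MAX_PROCS[, the range of valid node ids given in section 4. `next_manager` is a hypothetical helper, not part of the DIR net.

```c
#include <assert.h>

/* Naive successor rule: the next node id, wrapping around so that the
   result always stays in [0, nprocs). */
static int next_manager(int managerid, int nprocs)
{
    assert(nprocs > 0 && managerid >= 0 && managerid < nprocs);
    return (managerid + 1) % nprocs;
}
```

Note that, like the original rule, this does not skip nodes that are themselves down.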

43. Renew the IA-flag timeout.

⟨renew IA-flag-timeout 43⟩ ≡
  tom_renew(tom, &ia);

This code is used in sections 18 and 37. 

44. The number of tasks on this node is stored in the database. 

⟨store number of tasks (n); 44⟩ ≡
{
  db.node[GetRoot()].task_nr = n;
}

This code is used in section 8. 
45. Information pertaining to task i is stored in the database.

⟨store in local database(i, status) 45⟩ ≡
{
  int this_node = GetRoot();
  /* is thread i running or waiting? Is it isolated / faulty / ok... */
  db.node[this_node].task[i].status = status;
  /* initialize error_nr to zero */
  db.node[this_node].task[i].error_nr = 0;
}

This code is used in section 8. 

46. This section implements the algorithm used in the Voting Farm to manage global broadcasts (each
component has to broadcast some data to all the others and has to deliver the data sent by all the others):
the components take turns, and while component i broadcasts its local part of the database, every other
component receives it. See also “the Algorithm of Pipelined Broadcast”.

⟨broadcast local database and receive the others’ databases 46⟩ ≡
{
  int nProcs = MAX_PROCS;
  int task_nr, i;
  for (i = 0; i < nProcs; i++) {
    if (i == GetRoot()) {
      ⟨broadcast the local part of the database 48⟩
    }
    else {
      ⟨deliver remote database 47⟩
    }
  }
}

This code is used in section 8. 
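The turn-taking structure of the loop above can be simulated in memory. This sketch only illustrates that schedule (it does not model mailboxes, pipelining or the crossbar): `DEMO_NPROCS`, `view`, `init_views` and `exchange_all` are hypothetical. After the last turn every node holds every other node's data, at the cost of nprocs * (nprocs - 1) point-to-point transfers.

```c
#include <assert.h>

#define DEMO_NPROCS 4   /* hypothetical system size */

/* view[n][k] is node n's copy of node k's local data; -1 means "not yet known". */
static int view[DEMO_NPROCS][DEMO_NPROCS];

/* Every node starts knowing only its own local part. */
static void init_views(const int local[DEMO_NPROCS])
{
    int n, k;
    for (n = 0; n < DEMO_NPROCS; n++)
        for (k = 0; k < DEMO_NPROCS; k++)
            view[n][k] = (n == k) ? local[n] : -1;
}

/* Turn i: node i broadcasts its local part and every other node delivers it.
   Returns the number of point-to-point transfers performed. */
static int exchange_all(void)
{
    int i, dest, transfers = 0;
    for (i = 0; i < DEMO_NPROCS; i++) {
        assert(view[i][i] != -1);          /* the sender knows its own data */
        for (dest = 0; dest < DEMO_NPROCS; dest++) {
            if (dest == i)
                continue;                  /* a node does not send to itself */
            view[dest][i] = view[i][i];
            transfers++;
        }
    }
    return transfers;
}
```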





47. A remote database is delivered in three steps: first, the sender sends an integer with its node id;
second, the number of tasks to be transferred is sent; third, the task information is received into the
proper ‘slot’. The first two integers are received in the node’s “normal” mailbox, while the bulk of the data
is received via the DB_MBOX mailbox. Error records, if any, are then delivered in the same way: first their
number, then the records themselves.

⟨deliver remote database 47⟩ ≡
{
  int n;
  n = sizeof(int);
  TEXReceiveMessage(MBOX(GetRoot()), (char *) &sender, &n, INFINITE);
  fflush(stdout);
  n = sizeof(int);
  TEXReceiveMessage(MBOX(GetRoot()), (char *) &task_nr, &n, INFINITE);
  if (task_nr) {
    n = task_nr * sizeof(DIR_task_t);
    TEXReceiveMessage(DB_MBOX, (char *) db.node[sender].task, &n, INFINITE);
  }
  db.node[sender].task_nr = task_nr;
  n = sizeof(int);
  TEXReceiveMessage(MBOX(GetRoot()), (char *) &db.node[sender].error_nr, &n, INFINITE);
  if (db.node[sender].error_nr) {
    /* error records are DIR_error_t, as on the sending side (section 48) */
    n = db.node[sender].error_nr * sizeof(DIR_error_t);
    TEXReceiveMessage(DB_MBOX, (char *) db.node[sender].error, &n, INFINITE);
  }
}

This code is used in section 46. 
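The three-step framing (sender id, record count, then the records) can be sketched as a pack/unpack pair over a single buffer. This is an illustration only: the DIR net sends the integers and the bulk data as separate messages to different mailboxes, and `demo_task_t`, `pack_db` and `unpack_db` are hypothetical. The point shown is that the count received in step two tells the receiver how many bytes of bulk data to expect.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical task record; the real DIR_task_t is defined elsewhere. */
typedef struct {
    int id;
    int status;
} demo_task_t;

/* Sender side: node id, task count, then the task records.
   Returns the number of bytes written to 'buf'. */
static size_t pack_db(unsigned char *buf, int sender, int task_nr, const demo_task_t *tasks)
{
    size_t off = 0;
    memcpy(buf + off, &sender, sizeof(int));   off += sizeof(int);
    memcpy(buf + off, &task_nr, sizeof(int));  off += sizeof(int);
    if (task_nr > 0) {
        memcpy(buf + off, tasks, (size_t) task_nr * sizeof(demo_task_t));
        off += (size_t) task_nr * sizeof(demo_task_t);
    }
    return off;
}

/* Receiver side: reads the same fields in the same order, and uses the
   count to know how much bulk data follows. Returns the sender id. */
static int unpack_db(const unsigned char *buf, int *task_nr, demo_task_t *tasks, int max_tasks)
{
    int sender;
    size_t off = 0;
    memcpy(&sender, buf + off, sizeof(int));  off += sizeof(int);
    memcpy(task_nr, buf + off, sizeof(int));  off += sizeof(int);
    assert(*task_nr >= 0 && *task_nr <= max_tasks);   /* never overrun the slot */
    if (*task_nr > 0)
        memcpy(tasks, buf + off, (size_t) *task_nr * sizeof(demo_task_t));
    return sender;
}
```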

48. This is symmetrical to the previous section. The order of broadcast is optimal with respect to
maximizing the throughput of a fully connected (crossbar) system (see “the Algorithm of Pipelined
Broadcast”). The first two integers are sent to the destination node’s “normal” mailbox, while the bulk of
the data is sent to the destination node’s DB_MBOX mailbox. Error information is sent the same way.

⟨broadcast the local part of the database 48⟩ ≡
{
  int i, nprocs, me;
  me = GetRoot();
  nprocs = MAX_PROCS;
  for (i = 0; i < nprocs; i++) {
    if (i == me) continue;
    RemoteSendMessage(i, ALIAS(i), (char *) &me, sizeof(int));
    RemoteSendMessage(i, ALIAS(i), (char *) &db.node[me].task_nr, sizeof(int));
    if (db.node[me].task_nr) {
      RemoteSendMessage(i, DB_MBOX, (char *) db.node[me].task,
                        db.node[me].task_nr * sizeof(DIR_task_t));
    }
    RemoteSendMessage(i, ALIAS(i), (char *) &db.node[me].error_nr, sizeof(int));
    if (db.node[me].error_nr) {
      RemoteSendMessage(i, DB_MBOX, (char *) db.node[me].error,
                        db.node[me].error_nr * sizeof(DIR_error_t));
    }
  }
}

This code is used in section 46. 
49. Here the “dynamic” (modifiable) part of the database, i.e., the error, update and reboot counters, is
filled in with default values.

⟨build a global database 49⟩ ≡
{
  int nProcs = MAX_PROCS;
  for (i = 0; i < nProcs; i++) {
    db.node[i].error_nr = 0;
    db.node[i].update_nr = 0;
    db.node[i].reboot_nr = 0;
  }
}

This code is used in section 8. 

50. The whole database needs to survive reboots. This is accomplished by means of the DataReset()
function, which marks data as ‘reboot-resistant’.

⟨mark as ‘reboot-resistant’ the whole database via DataReset() 50⟩ ≡

{ /* TXT has to supply this information */ 

} 

This code is used in section 8. 


51. This section sends a REQUEST_DB message to one or more of the neighbouring DIR net components and
waits until it gets a full copy of the global database. Note: a special mailbox (DB_MBOX) is used for the
reply, because such messages are much larger than those sent to the ‘normal’ mailboxes. Each node, say
node n, sends its requests first to node (n + 1) mod MAX_PROCS, then to the following nodes on a circular
basis, until one of them replies.

⟨request a copy of the global database 51⟩ ≡
{
  int this_node = GetRoot();
  int nProcs = MAX_PROCS;
  int i;
  int size;
  int retval;
  int got_db = 0;  /* set as soon as some node has sent us its copy of the db */

  message.type = REQUEST_DB;
  message.subid = this_node;
  /* first ask nodes this_node + 1, ..., nProcs - 1 ... */
  for (i = this_node + 1; i < nProcs && !got_db; i++) {
    RemoteSendMessage(i, ALIAS(i), (char *) &message, sizeof(message));
    size = sizeof(db);
    retval = TEXReceiveMessage(DB_MBOX, (char *) &db, &size, REPLY_DB_TIMEOUT);
    got_db = (retval == MSG_OK);
  }
  /* ... then wrap around to nodes 0, ..., this_node - 1 */
  for (i = 0; i < this_node && !got_db; i++) {
    RemoteSendMessage(i, ALIAS(i), (char *) &message, sizeof(message));
    size = sizeof(db);
    retval = TEXReceiveMessage(DB_MBOX, (char *) &db, &size, REPLY_DB_TIMEOUT);
    got_db = (retval == MSG_OK);
  }
}

This code is used in section 8. 
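The circular order in which node n asks its peers can be sketched independently of the messaging. In this sketch `ask_fn` and `request_db_circular` are hypothetical: the callback stands for "send REQUEST_DB to node i and wait up to REPLY_DB_TIMEOUT for the reply", and the search stops at the first node that answers.

```c
#include <assert.h>

/* Hypothetical callback: returns 1 if 'node' answered the request in time. */
typedef int (*ask_fn)(int node);

/* Tries nodes (me+1) % n, (me+2) % n, ..., never 'me' itself, and stops at
   the first one that replies. Returns that node's id, or -1 if nobody did. */
static int request_db_circular(int me, int n, ask_fn ask)
{
    int step;
    assert(n > 0 && me >= 0 && me < n);
    for (step = 1; step < n; step++) {
        int node = (me + step) % n;
        if (ask(node))
            return node;
    }
    return -1;
}

/* Example responders used to exercise the function. */
static int only_node_1_alive(int node) { return node == 1; }
static int nobody_alive(int node) { (void) node; return 0; }
```

With four nodes, node 2 asks nodes 3, 0 and 1 in that order; if only node 1 is alive, node 1 supplies the copy.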
52. This is the function which is called when a timeout occurs. In a sense, it translates a timeout event
into a message event, which is sent to the local DIR net component.

⟨Alarm function 52⟩ ≡
int send_timeout_message(TOM *tom)
{
  message.type = tom->top->timeout.id;
  message.subid = tom->top->timeout.subid;
  message.local = 1;
  /* no test on return value at the moment */
  return TEXSendMessage(MBOX(GetRoot()), (char *) &message, sizeof(message));
}

This code is used in section 4. 
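The translation performed by the alarm function can be shown on simplified types. This sketch assumes hypothetical shapes `demo_tmo_t` and `demo_msg_t` in place of the timeout manager's entries and the DIR message; its point is that the alarm does no recovery work itself, so timeouts and network messages reach the main loop through the same switch on the message type.

```c
#include <assert.h>

/* Hypothetical, simplified shapes of a timeout entry and of a DIR message. */
typedef struct { int id; int subid; } demo_tmo_t;
typedef struct { int type; int subid; int local; } demo_msg_t;

/* Turns an expired timeout into an ordinary message for the local component. */
static demo_msg_t timeout_to_message(const demo_tmo_t *expired)
{
    demo_msg_t m;
    assert(expired);
    m.type = expired->id;      /* e.g. a MIA- or TAIA-timeout code */
    m.subid = expired->subid;  /* node the timeout was about */
    m.local = 1;               /* generated locally, not received from the net */
    return m;
}
```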

53. Note: at the moment, this is shared between backup and manager! The message is received into a global
variable called “message”, holding fields like “type”, “subid” and “arg[0..argc]”. Open question: is it
acceptable to block indefinitely, as the following code does, or is it better to use even a small timeout?

⟨wait for an incoming message 53⟩ ≡
{
  int size = sizeof(message_t);
  LogError(EC_MESS, "<wait for an incoming message>", "waiting...");
  TEXReceiveMessage(MBOX(GetRoot()), (char *) &message, &size, INFINITE);
}

This code is used in sections 18 and 37. 

54. Broadcast a ‘Who Is The Manager’ (WITM) message to every node except this one.

⟨send WITM to all 54⟩ ≡
{
  int i;
  message.type = WITM;
  for (i = 0; i < GetRoot(); i++) {
    RemoteSendMessage(i, ALIAS(i), (char *) &message, sizeof(message));
  }
  /* here i == GetRoot(): skip this node */
  for (i++; i < MAX_PROCS; i++) {
    RemoteSendMessage(i, ALIAS(i), (char *) &message, sizeof(message));
  }
}

This code is used in section 8. 
55. If a node has rebooted, or a new ‘generic component’ has taken the place of another one, then the
running component needs to clarify its own role in the DIR net. This is done by sending a WITM around.
The following step waits until an NMI (‘New Manager Is...’) message comes in; any other message received
meanwhile is logged and discarded.

⟨wait for NMI messages to come 55⟩ ≡
{
  int i, n, size;
  for (i = 0, n = MAX_PROCS, size = sizeof(message_t); 1; /* forever */
       i = (i + 1) % n) {
    if (TEXReceiveMessage(MBOX(i), (char *) &message, &size, 0) == MSG_OK) {
      if (message.type == NMI) break;
      LogError(EC_ERROR, "Backup",
               "NMI message expected, %d message received from node %d", message.type, i);
    }
  }
}

This code is used in section 8. 
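The polling loop can be exercised with simulated mailboxes. In this sketch `DEMO_NMI`, `DEMO_NBOX`, `pending`, `try_receive` and `wait_for_nmi` are hypothetical; `try_receive` plays the role of a receive with timeout 0. The index advances as `(i + 1) % n`, which stays well defined in C, whereas `i = ++i % n` modifies `i` twice without sequencing.

```c
#include <assert.h>

#define DEMO_NMI 7          /* hypothetical message code */
#define DEMO_NBOX 3         /* hypothetical number of mailboxes */

/* Each simulated mailbox holds at most one pending message type; 0 = empty. */
static int pending[DEMO_NBOX];
static int discarded;       /* non-NMI messages seen while waiting */

/* Non-blocking receive: returns 1 and the message type if mailbox i had one. */
static int try_receive(int i, int *type)
{
    if (pending[i] == 0)
        return 0;
    *type = pending[i];
    pending[i] = 0;
    return 1;
}

/* Polls mailboxes 0, 1, ..., n-1, 0, ... until an NMI arrives and returns the
   index of its mailbox. Like the original loop, it spins forever otherwise. */
static int wait_for_nmi(int n)
{
    int i, type;
    assert(n > 0);
    for (i = 0; ; i = (i + 1) % n) {
        if (try_receive(i, &type)) {
            if (type == DEMO_NMI)
                return i;
            discarded++;    /* the DIR net logs these instead */
        }
    }
}
```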

56. These two functions respectively get and set the overall state of this generic component.

⟨GetState and SetState 56⟩ ≡
TEXGetState(status_t *s)
{
  memcpy(s, &db.status, sizeof(status_t));
  /* a dirty trick for the time being... */
  s->primary = 1;
}

TEXSetState(status_t *s)
{
  memcpy(&db.status, s, sizeof(status_t));
}

This code is used in section 4. 

57. At true initialisation time, the role of this component is read from the RL script.

⟨Read your role from the RL script 57⟩ ≡
{
  int i;
  for (i = 0; i < RCODE_CARD; i++) {
    if (rcodes[i][0] == R_SET_ROLE && rcodes[i][1] == GetRoot()) {
      switch (rcodes[i][2]) {
      case R_AGENT: db.role = DIR_AGENT;
        break;
      case R_BACKUP: db.role = DIR_BACKUP;
        break;
      case R_MANAGER: db.role = DIR_MANAGER;
        break;
      }
      break;
    }
  }
}

This code is used in section 8. 
58. At true initialisation time, the identity of the manager is read from the RL script.

⟨Check who is the manager according to the RL script 58⟩ ≡
{
  int i;
  for (i = 0; i < RCODE_CARD; i++) {
    if (rcodes[i][0] == R_SET_ROLE && rcodes[i][2] == R_MANAGER) {
      managerid = rcodes[i][1];
      break;
    }
  }
  LogError(EC_ERROR, "init", "the manager is on node %d.", managerid);
}

This code is used in section 8. 
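Sections 57 and 58 scan the same R-code table with two different keys: by node (to find this component's role) and by role (to find the manager's node). The sketch below runs both scans on a toy table. The opcodes, role codes and `demo_rcodes` are hypothetical stand-ins for R_SET_ROLE, R_MANAGER, rcodes[] and RCODE_CARD, and the row layout { opcode, node, role } is assumed from the way the two sections index rcodes.

```c
#include <assert.h>

/* Hypothetical opcodes and roles; the real ones come from the rl translator. */
enum { OP_SET_ROLE = 1, OP_OTHER = 2 };
enum { ROLE_NONE = 0, ROLE_AGENT, ROLE_BACKUP, ROLE_MANAGER };

/* Toy R-code table, one instruction per row: { opcode, node, role }. */
static const int demo_rcodes[][3] = {
    { OP_OTHER,    0, 0 },
    { OP_SET_ROLE, 0, ROLE_BACKUP },
    { OP_SET_ROLE, 1, ROLE_MANAGER },
    { OP_SET_ROLE, 2, ROLE_AGENT },
};
#define DEMO_CARD ((int) (sizeof demo_rcodes / sizeof demo_rcodes[0]))

/* As in section 57: the role given by the first SET_ROLE row naming 'node'. */
static int role_of(int node)
{
    int i;
    assert(node >= 0);
    for (i = 0; i < DEMO_CARD; i++)
        if (demo_rcodes[i][0] == OP_SET_ROLE && demo_rcodes[i][1] == node)
            return demo_rcodes[i][2];
    return ROLE_NONE;
}

/* As in section 58: the node of the first SET_ROLE row assigning MANAGER. */
static int manager_node(void)
{
    int i;
    for (i = 0; i < DEMO_CARD; i++)
        if (demo_rcodes[i][0] == OP_SET_ROLE && demo_rcodes[i][2] == ROLE_MANAGER)
            return demo_rcodes[i][1];
    return -1;
}
```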
59. When TEX is not defined, the TEX primitives used by the DIR net are simulated on top of EPX.

⟨TEX routines simulated on EPX 59⟩ ≡
#ifndef TEX
RemoteSendMessage(int node, int alias, char *message, int size)
{
  int retval;
#ifdef EPX_MAILBOXES
  LogError(EC_DEBUG, "sender", "about to send a %d byte msg to node %d, req-id %d", size,
           node, alias);
  retval = PutMessage(node, alias, MSG_TYPE_USER_START, 1, -1, message, size);
  if (retval < 0) {
    char st[80];
    LogError(EC_ERROR, "RemoteSendMessage", "PutMessage error:");
    sprintf(st, "PutMessage on node %d: ", GetRoot());
    perror(st);
  }
#else
  retval = SendNode(node, alias, message, size);
  if (retval < 0) {
    printf("%d: RemoteSendMessage:\n", GetRoot());
    perror("Error sending with SendNode");
  }
  else if (retval > 0) {
    printf("%d: RemoteSendMessage:\n", GetRoot());
    printf("%d: send message size is greater than the receive buffer\n", GetRoot());
  }
#endif
}

TEXSendMessage(int mbox, char *message, int size)
{
  int retval;
#ifdef EPX_MAILBOXES
  LogError(EC_DEBUG, "sender", "about to send a %d byte msg to node %d, req-id %d", size,
           GET_ROOT()->ProcRoot->MyProcID, mbox);
  retval = PutMessage(GET_ROOT()->ProcRoot->MyProcID, mbox, MSG_TYPE_USER_START, 1, -1, message,
                      size);
  if (retval < 0) {
    char st[80];
    LogError(EC_ERROR, "TEXSendMessage", "PutMessage error:");
    sprintf(st, "PutMessage on node %d: ", GetRoot());
    perror(st);
  }
#else
  retval = SendNode(GET_ROOT()->ProcRoot->MyProcID, mbox, message, size);
  if (retval < 0) {
    printf("%d: TEXSendMessage:\n", GetRoot());
    perror("Error sending with SendNode");
  }
  else if (retval > 0) {
    printf("%d: TEXSendMessage:\n", GetRoot());
    printf("%d: send message size is greater than the receive buffer\n", GetRoot());
  }
#endif
}

TEXReceiveMessage(int mbox, char *message, int *size, int timeout)
{
  int retval;
#ifdef EPX_MAILBOXES
  RR_Message_t *m;
  m = malloc(sizeof(RR_Message_t));
#endif
  /* note: the timeout argument is ignored by this simulation, and no value is returned yet,
     although sections 51 and 55 compare the result with MSG_OK */
  if (flag) LogError(EC_ERROR, "RecvMessage",
                     "TEXReceiveMessage about to receive a msg (size=%d) from any node on req-id %d",
                     *size, mbox);
#ifdef EPX_MAILBOXES
  retval = GetMessage(-1, mbox, MSG_TYPE_USER_START, -1, m);
  if (retval < 0) {
    char st[80];
    LogError(EC_ERROR, "TEXReceiveMessage", "GetMessage error:");
    sprintf(st, "GetMessage on node %d: ", GetRoot());
    perror(st);
  }
  else {
    /* copy the body only when a message was actually received */
    *size = m->Header.Size;
    memcpy(message, m->Body, *size);
  }
  free(m);
#else
  *size = RecvNode(-1, mbox, message, *size);
  if (flag) LogError(EC_ERROR, "RecvMessage",
                     "TEXReceiveMessage received a msg on req-id %d", mbox);
  if (*size < 0) printf("%d: TEXReceiveMessage error: received message is greater than "
                        "receive buffer\n", GetRoot());
#endif
}

GetRoot()
{
  return GET_ROOT()->ProcRoot->MyProcID;
}

Export(Alias_t a, IDF b)
{}

int TEXGetNumTasks()
{
  extern int num_tasks[];
  return num_tasks[GetRoot()];
}

/* this function should return a value meaning that the task is running (DIR_RUNNING) or waiting
   to be enabled (DIR_WAITING). For the time being, as we don't have access to this information,
   we only return DIR_RUNNING */
STATUS TEXGetTaskStatus(IDF t)
{
  return DIR_RUNNING;
}

TEXStopTask()
{
}

TEXRestartTask()
{
  DIRNetGenericComponent();
  TEXStopTask();
}
#endif

This code is used in section 4. 

60. These functions convert a message or timeout identifier (as defined in dirdefs.h), or a role, into a
human-readable string.

⟨DIR Print Message 60⟩ ≡
char *DIRPrintMessage(int i)
{
  if (i > FIRST_MESSAGE && i < LAST_MESSAGE) return DIRMessage[i - FIRST_MESSAGE];
  return "<unknown>";
}

char *DIRPrintTimeout(int i)
{
  switch (i) {
  case IA_FLAG_TIMEOUT: return "IA flag timeout";
  case MIA_TIMEOUT: return "MIA timeout";
  case TAIA_TIMEOUT: return "TAIA timeout";
  case TEIF_TIMEOUT: return "TEIF timeout";
  case IA_FLAG_TIMEOUT_B: return "IA flag 'B' timeout";
  case MIA_TIMEOUT_B: return "MIA 'B' timeout";
  case TAIA_TIMEOUT_B: return "TAIA 'B' timeout";
  case TEIF_TIMEOUT_B: return "TEIF 'B' timeout";
  case IAT_TIMEOUT: return "IA Task timeout";
  case INJECT_FAULT_TIMEOUT: return "F. injecting timeout";
  default: return "<unknown>";
  }
}

char *DIRPrintCode(int i)
{
  char *s;
  s = DIRPrintTimeout(i);
  if (strcmp(s, "<unknown>") == 0) return DIRPrintMessage(i);
  return s;
}

char *role2ascii(int role)
{
  switch (role) {
  case DIR_MANAGER: return "manager";
  case DIR_BACKUP: return "backup agent";
  default: return "<unknown>";
  }
}

This code is used in section 4. 
61. The I’m Alive Task (IAT). After initialisation, the IAT enters a waiting state in which it waits for an
activation message from the agent it guards...

⟨I’m Alive Task 61⟩ ≡
message_t amessage;
timeout_t aia;
TOM *atom;
⟨The IAlarm function 63⟩
⟨The DIRAlive function 62⟩

This code is used in section 4. 

62. This is the code for the I’m Alive Task. 

⟨The DIRAlive function 62⟩ ≡
int DIRAlive(void)
{
  int subid;
  int IAlarm(TOM *);

  atom = tom_init(IAlarm);
restart:
  LogError(EC_ERROR, "IAT", "IAT starts...");
  ⟨wait for activation message 64⟩
  LogError(EC_ERROR, "IAT", "IAT activated...");
  subid = amessage.subid;
  /* the alarm of these timeouts simply sends a message of id timeout_type, subid = subid,
     to the IAT */
  ⟨insert timeout (IA-flag-timeout, CYCLIC, subid) 65⟩
  LogError(EC_ERROR, "IAT", "timeout activated. now enter the main loop...");
  while (1) {
    LogError(EC_ERROR, "IAT", "waiting for a message");
    ⟨IATask: wait for an incoming message 66⟩
    LogError(EC_ERROR, "IAT", "a message came in: %d (%s)", amessage.type,
             DIRPrintCode(amessage.type));
    if (amessage.type == IAT_TIMEOUT) {  /* time to check the IA-flag! */
      if (IA_flag == 0) {
        LogError(EC_ERROR, "IAT", "IA flag is zero. Correct.");
        IA_flag = 1;
        LogError(EC_ERROR, "IAT", "IA flag has been set again.");
      }
      else {
        LogError(EC_ERROR, "IAT", "A L A R M !!!");
        ⟨send TEIF to all except subid 67⟩
        ⟨delete timeout (IA-flag-timeout) 68⟩
        break;
      }
    }
  } /* end loop */
  /* TEXRestartTask(); */
  goto restart;
} /* end alive */

This code is used in section 61. 
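The IA-flag check above works like a watchdog: the guarded component clears the flag while it is alive (cf. ⟨clear IA-flag 16⟩), and at every cyclic IAT timeout the IAT either re-arms the flag or raises the alarm. The sketch below is single-threaded and uses hypothetical names (`ia_flag`, `agent_heartbeat`, `iat_check`); in the DIR net the agent and the IAT run in different threads, so a real shared flag would need appropriate synchronisation.

```c
#include <assert.h>

/* Simplified, single-threaded model of the IA-flag protocol. */
static int ia_flag = 1;

/* The guarded component proves it is alive by clearing the flag. */
static void agent_heartbeat(void) { ia_flag = 0; }

/* Called at every cyclic IAT timeout. Returns 0 and re-arms the flag if it
   was cleared since the previous check; returns 1 (alarm) if it is still set,
   i.e. the component did not clear it during the last period. */
static int iat_check(void)
{
    assert(ia_flag == 0 || ia_flag == 1);
    if (ia_flag == 0) {
        ia_flag = 1;      /* correct: re-arm for the next period */
        return 0;
    }
    return 1;             /* alarm: the DIR net would now send TEIF around */
}
```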
63. The Alarm function for the I’m Alive Task.

⟨The IAlarm function 63⟩ ≡
int IAlarm(TOM *atom)
{
  amessage.type = IAT_TIMEOUT;
  return TEXSendMessage(IAT_MBOX, (char *) &amessage, sizeof(amessage));
}

This code is used in section 61. 

64. The I’m Alive Task here is waiting for an activation message from its generic component. 

⟨wait for activation message 64⟩ ≡
{
  int size = sizeof(message_t);
  LogError(EC_ERROR, "<wait for activation message>", "waiting for messages...");
  /* flag = 1; */
  TEXReceiveMessage(IAT_MBOX, (char *) &amessage, &size, INFINITE);
  LogError(EC_ERROR, "<wait for activation message>", "got a message (%s)",
           DIRPrintCode(amessage.type));
}

This code is used in section 62. 

65. The I’m Alive Task, too, makes use of the timeout manager.

⟨insert timeout (IA-flag-timeout, CYCLIC, subid) 65⟩ ≡
{
  int IAlarm(TOM *);
  tom_declare(&aia, IAT_CYCLIC, TOM_SET_ENABLE, IAT_TIMEOUT, subid, IAT_DEADLINE);
  tom_insert(atom, &aia);
}

This code is used in section 62. 

66. Same as ⟨wait for an incoming message 53⟩, but for the I’m Alive Task.

⟨IATask: wait for an incoming message 66⟩ ≡
{
  int size = sizeof(message_t);
  TEXReceiveMessage(IAT_MBOX, (char *) &amessage, &size, INFINITE);
  LogError(EC_ERROR, "IAT", "got message %d (%s)", amessage.type, DIRPrintCode(amessage.type));
}

This code is used in section 62. 




67. 

⟨send TEIF to all except subid 67⟩ ≡
{
  int i;
  int you = GetRoot();
  for (i = 0; i < MAX_PROCS; i++) {
    if (i != subid && i != you) {
      amessage.type = TEIF;  /* i.e., "This Entity Is Faulty" */
      amessage.subid = GetRoot();
      RemoteSendMessage(i, ALIAS(i), (char *) &amessage, sizeof(amessage));
    }
  }
}

This code is used in section 62. 

68.

⟨delete timeout (IA-flag-timeout) 68⟩ ≡
  tom_delete(atom, &aia);

This code is used in section 62. 

69. 

⟨Spawn the I’m Alive Task 69⟩ ≡
{
  int DIRAlive(void);
  StartThread(DIRAlive, 65536, &errors, 0, NULL);
}

This code is used in section 6. 

/* eof dirnet.w */ 

__DIR_TYPES_: 5.
_TOM__H_: 5.
a: 59.
aia: 65, 68.
alarm: 5.
ALIAS: 4, 8, 18, 19, 21, 24, 37, 38, 41, 48, 51, 54, 67.
alias: 59.
Alias_t: 5, 8, 17, 59.
amessage: 62, 63, 64, 66, 67.
ANID: 41.
arg: 5, 8, 18, 19, 22, 37.
atom: 61, 62, 63, 65, 68.
b: 59.
block_sp: 5.
block_stack: 5.
block_t: 5.
Body: 59.
code: 5.
configuration: 5.
cyclic: 5.
DataReset: 50.
db: 5, 8, 18, 22, 37, 44, 45, 47, 48, 49, 51, 56, 57.
DB: 18, 22, 37.
DB_INC_REBOOT: 22.
DB_MBOX: 4, 8, 47, 48, 51.
DB_NEW_ROLE: 22.
DB_NEW_STATUS: 22.
DB_NEW_TASK_ERROR: 22.
DB_NEW_TASK_STATUS: 22.
deadline: 5.
default_alarm: 5.
DIR_AGENT: 57.
DIR_ALIAS_OFFSET: 3, 4.
DIR_BACKUP: 8, 57, 60.
DIR_db_t: 5.
DIR_error_t: 48.
DIR_MANAGER: 6, 8, 37, 57, 60.
DIR_MBOX_OFFSET: 2, 4.
DIR_RUNNING: 59.
DIR_state_t: 5.
DIR_task_t: 47, 48.
DIR_WAITING: 59.
DIRAlive: 69.
DIRMessage: 60.

DIRNetGenericComponent: 6, 59.
DIRPrintCode: 5, 37, 62, 64, 66.
DIRPrintMessage: 5, 60.
DIRPrintTimeout: 5, 60.
EC_DEBUG: 59.
EC_ERROR: 8, 9, 18, 33, 35, 37, 38, 55, 58, 59, 62, 64, 66.
EC_MESS: 8, 53.
ENIA: 18, 37.
EPX: 5.
EPX_MAILBOXES: 59.
EPX1_2: 5.
error_nr: 45, 47, 48, 49.
errors: 5, 69.
exit: 59.
Export: 5, 8, 17, 59.
FALSE: 18, 37.
fflush: 47.
FIRST_MESSAGE: 60.
flag: 5, 59.
fork: 8.
GET_ROOT: 59.
GetMessage: 59.
GetRoot: 5, 8, 9, 11, 13, 17, 18, 19, 20, 21, 24, 37, 38, 41, 44, 45, 46, 47, 48, 51, 52, 53, 54, 57, 59, 67.
Header: 59.
i: 8, 13, 14, 20, 21, 24, 41, 46, 48, 57, 60, 67.
ia: 7, 12, 43.
IA_flag: 5, 16, 62.
IA_FLAG_CYCLIC: 4, 12.
IA_FLAG_CYCLIC_B: 4.
IA_FLAG_DEADLINE: 4.
IA_FLAG_DEADLINE_B: 4.
IA_FLAG_TIMEOUT: 4, 12, 16, 18, 37, 60.
IA_FLAG_TIMEOUT_B: 4, 60.
IAlarm: 62.
IAT_ALIAS: 4, 8, 29, 30.
IAT_CYCLIC: 4, 65.
IAT_DEADLINE: 4, 65.
IAT_MBOX: 4, 8, 17, 63, 64, 66.
IAT_TIMEOUT: 4, 60, 62, 63, 65.
id: 5, 52.
IDF: 5, 8, 17, 59.
IMALIVE_CLEAR_TIMEOUT: 4, 12.
IMALIVE_SET_TIMEOUT: 4, 26.
INFINITE: 5, 47, 53, 64, 66.
InitSem: 6.
inject: 7, 11.
INJECT: 11.
INJECT_FAULT_DEADLINE: 4, 11.
INJECT_FAULT_TIMEOUT: 4, 11, 18, 60.
LAST_MESSAGE: 60.
link: 5.
LinkGB_t: 5.
local: 5, 37, 52.
LogError: 8, 9, 18, 33, 35, 37, 38, 53, 55, 58, 59, 62, 64, 66.
malloc: 59.
managerid: 7, 8, 9, 30, 35, 36, 37, 38, 41, 42, 58.
MAX_PROCS: 4, 7, 11, 13, 14, 20, 21, 24, 41, 42, 46, 48, 49, 51, 54, 55, 67.
MAXARG: 5.
mbox: 59.
MBOX: 4, 8, 47, 52, 53, 55.
me: 48.
memcpy: 56, 59.
message: 5, 8, 17, 18, 19, 21, 22, 23, 24, 25, 26, 27, 29, 30, 32, 37, 38, 40, 41, 51, 52, 53, 54, 55, 59.
message_t: 5, 53, 55, 61, 64, 66.
MIA: 11, 18, 19, 20, 34, 35, 37.
mia: 7, 11, 18, 20, 35, 37.
MIA_CYCLIC: 4, 11.
MIA_CYCLIC_B: 4, 35, 40.
MIA_DEADLINE: 4, 11.
MIA_DEADLINE_B: 4, 35, 40.
MIA_RECV_TIMEOUT: 4.
MIA_SEND_TIMEOUT: 4, 20.
MIA_TIMEOUT: 4, 11, 18, 60.
MIA_TIMEOUT_B: 4, 35, 37, 40, 60.
MSG_OK: 5, 51, 55.
MSG_TYPE_USER_START: 59.
MyProcID: 59.
mystate: 7, 8, 37.
n: 7, 24, 41, 47, 55.
next: 5.
NIUA: 18, 24.
NMI: 18, 37, 55.
node: 22, 44, 45, 47, 48, 49, 59.
nprocs: 48.
nProcs: 17, 46.
num_tasks: 59.
perror: 59.
primary: 5, 8, 56.
printf: 59.
ProcRoot: 59.
PutMessage: 59.
R_AGENT: 57.
R_BACKUP: 57.
R_MANAGER: 57, 58.
R_SET_ROLE: 57, 58.
RCODE_CARD: 57, 58.
rcodes: 57, 58.
reboot_nr: 22, 49.
RecvNode: 59.
RemoteSendMessage: 5, 18, 19, 21, 24, 29, 30, 37, 38, 41, 48, 51, 54, 59, 67.
REPLY_DB_TIMEOUT: 51.
REQUEST_DB: 18, 37, 51.
restart: 62.
retval: 51, 59.
RINT_MBOX: 4.
role: 5, 6, 7, 8, 22, 37, 57, 60.
role2ascii: 5, 8, 60.
ROUSE: 17.
RR_Message_t: 59.
runlevel: 5.
running: 5.
s: 56, 60.
sem: 5.
Semaphore_t: 5.
send_timeout_message: 5, 10, 34, 52.
sender: 7, 18, 37, 47.
SendNode: 59.
size: 51, 53, 55, 59, 64, 66.
Size: 59.
SPAN: 29, 30.
sprintf: 59.
st: 59.
starting_time: 5.
StartThread: 69.
status: 7, 8, 22, 45, 56.
STATUS: 5, 7, 59.
status_t: 5, 7, 56.
stdout: 47.
strcmp: 60.
subid: 5, 9, 17, 18, 19, 22, 23, 24, 25, 26, 27, 29, 33, 37, 38, 40, 51, 52, 62, 65, 67.
suspended: 5.
suspicion_period: 7, 14, 15, 18, 37.
t: 40, 59.
taia: 7, 13, 18, 23, 25, 27, 36, 39.
TAIA: 13, 18, 27, 34, 36, 38.
TAIA_CYCLIC: 4, 13.
TAIA_CYCLIC_B: 4, 36.
TAIA_DEADLINE: 4, 13.
TAIA_DEADLINE_B: 4, 36.
TAIA_RECV_TIMEOUT: 4.
TAIA_SEND_TIMEOUT: 4, 23.
TAIA_TIMEOUT: 4, 13, 18, 60.
TAIA_TIMEOUT_B: 4, 36, 37, 60.
task: 22, 45, 47, 48.
task_nr: 44, 46, 47, 48.
teif: 7, 26, 28, 37.
TEIF: 18, 28, 37, 67.
TEIF_CYCLIC: 4, 26.
TEIF_CYCLIC_B: 4.
TEIF_DEADLINE: 4.
TEIF_DEADLINE_B: 4.
TEIF_TIMEOUT: 4, 18, 26, 60.
TEIF_TIMEOUT_B: 4, 37, 40, 60.
TEX: 5, 59.
TEXFirstActivation: 5, 8.
TEXGetNumTasks: 5, 8, 59.
TEXGetState: 5, 8, 56.
TEXGetTaskStatus: 5, 8, 59.
TEXReboot: 31.
TEXReceiveMessage: 5, 47, 51, 53, 55, 64, 66.
TEXRestartTask: 5, 37, 59.
TEXSendMessage: 5, 17, 52, 59, 63.
TEXSetState: 5, 37, 56.
TEXStopTask: 5, 6, 59.
this: 24, 41.
this_node: 45, 51.
timeout: 5, 52, 59.
timeout_t: 5, 7, 40, 61.
to: 18.
TOM: 5, 7, 52, 61, 62, 63, 65.
tom: 7, 10, 11, 12, 13, 18, 20, 23, 25, 26, 27, 28, 34, 35, 36, 37, 39, 40, 43, 52.
tom_close: 18.
TOM_CYCLIC: 4.
tom_declare: 11, 12, 13, 26, 35, 36, 40, 65.
tom_delete: 27, 28, 37, 40, 68.
tom_dump: 18, 35, 37.
tom_id: 5.
tom_init: 10, 34, 62.
tom_insert: 11, 12, 13, 18, 23, 26, 35, 36, 65.
tom_ispresent: 18, 37.
TOM_MBOX: 4.
tom_message_t: 5.
TOM_NON_CYCLIC: 4, 11.
tom_renew: 18, 20, 25, 37, 39, 43.
tomsem: 5, 6.
TOM_SET_ENABLE: 11, 12, 13, 26, 35, 36, 40, 65.
top: 5, 52.
TRUE: 18, 37.
type: 5, 17, 18, 19, 24, 29, 30, 32, 37, 38, 41, 51, 52, 54, 55, 62, 63, 64, 66, 67.
update_nr: 49.
used: 5.
WITM: 18, 37, 54, 55.
you: 67.
⟨Agent-Recovery(subid) 29⟩ Used in section 18.
⟨Alarm function 52⟩ Used in section 4.
⟨Check who is the manager according to the RL script 58⟩ Used in section 8.
⟨DIR Print Message 60⟩ Used in section 4.
⟨DIR net backup agent 33⟩ Used in section 6.
⟨DIR net initialisation 8⟩ Used in section 6.
⟨DIR net manager 9⟩ Used in section 6.
⟨Generic component of the DIR net 6⟩ Used in section 4.
⟨Global Variables and #include’s 5⟩ Used in section 4.
⟨I’m Alive Task 61⟩ Used in section 4.
⟨IATask: wait for an incoming message 66⟩ Used in section 62.
⟨Manager-Recovery(managerid) 30⟩ Used in section 37.
⟨Node-Recovery(subid) 31⟩ Used in sections 18 and 37.
⟨Read your role from the RL script 57⟩ Used in section 8.
⟨Spawn the I’m Alive Task 69⟩ Used in section 6.
⟨TEX routines simulated on EPX 59⟩ Used in section 4.
⟨The DIRAlive function 62⟩ Used in section 61.
⟨The IAlarm function 63⟩ Used in section 61.
⟨Variables local to the Manager and the Backup 7⟩ Used in section 6.
⟨activate IAT 17⟩ Used in sections 9, 18, 33, and 37.
⟨backup loop (waiting for incoming messages) 37⟩ Used in section 33.
⟨broadcast local database and receive the others’ databases 46⟩ Used in section 8.
⟨broadcast modifications to db 21⟩ Used in sections 18 and 37.
⟨broadcast the local part of the database 48⟩ Used in section 46.
⟨broadcast NIUA 24⟩ Used in section 18.
⟨build a global database 49⟩ Used in section 8.
⟨choose next manager 42⟩ Used in section 37.
⟨clear IA-flag 16⟩ Used in sections 9, 18, 33, and 37.
⟨clear the suspicion_period[]’s 14⟩ Used in section 9.
⟨clear suspicion_period 15⟩ Used in section 33.
⟨deal with these other messages 32⟩ Used in sections 18 and 37.

⟨declare and insert MIA timeouts 11⟩ Used in section 10.
⟨declare and insert TAIA timeouts 13⟩ Used in section 10.
⟨declare and insert the IA-flag timeout 12⟩ Used in sections 10 and 34.
⟨declare and insert the one MIA timeout 35⟩ Used in section 34.
⟨declare and insert the one TAIA timeout 36⟩ Used in section 34.
⟨delete MIA-timeout coming from subid, if any exists 40⟩ Used in section 37.
⟨delete timeout (IA-flag-timeout) 68⟩ Used in section 62.
⟨delete timeout (TAIA-timeout, subid) 27⟩ Used in section 18.
⟨delete timeout (TEIF-timeout, subid) 28⟩ Used in section 18.
⟨deliver remote database 47⟩ Used in section 46.
⟨insert TAIA-timeout, subid = subid 23⟩ Used in section 18.
⟨insert four timeouts (IA-flag, MIA, TAIA, and TEIF) 34⟩ Used in section 33.
⟨insert timeout (IA-flag-timeout, CYCLIC, subid) 65⟩ Used in section 62.
⟨insert timeouts (IA-flag-timeout, MIA-timeouts, TAIA-timeouts) 10⟩ Used in section 9.
⟨insert TEIF timeout 26⟩ Used in sections 18 and 37.
⟨manager loop (waiting for incoming messages) 18⟩ Used in section 9.
⟨mark as ‘reboot-resistant’ the whole database via DataReset() 50⟩ Used in section 8.
⟨renew IA-flag-timeout 43⟩ Used in sections 18 and 37.
⟨renew TAIA-timeout, subid 25⟩ Used in section 18.
⟨renew MIA timeout on all subid’s 20⟩ Used in section 18.
⟨renew TAIA timeout 39⟩ Used in section 37.
⟨request a copy of the global database 51⟩ Used in section 8.
⟨send ANID to all except managerid 41⟩ Used in section 37.
⟨send MIA to backup subid 19⟩ Used in section 18.
⟨send TAIA to the manager 38⟩ Used in section 37.
⟨send TEIF to all except subid 67⟩ Used in section 62.
⟨send WITM to all 54⟩ Used in section 8.
⟨store in local database(i, status) 45⟩ Used in section 8.
⟨store number of tasks (n); 44⟩ Used in section 8.
⟨update your copy of the db 22⟩ Used in sections 18 and 37.
⟨wait for activation message 64⟩ Used in section 62.
⟨wait for an incoming message 53⟩ Used in sections 18 and 37.
⟨wait for NMI messages to come 55⟩ Used in section 8.
⟨GetState and SetState 56⟩ Used in section 4.